Performance Issues and Solutions: SAS with Hadoop

Author

  • Anjan Matlapudi
Abstract

Today many analysts in information technology face the challenge of working with large amounts of data. Most analysts are smart enough to find their way out and write queries using appropriate table join conditions, data mining, index keys, and hash objects for quick data retrieval. Recently, SAS has collaborated with Hadoop data storage and started providing efficient processing power to SAS analytics.

Similar resources

Automating Pharmaceutical Safety Surveillance process

Pharmaceutical companies invest a huge amount of effort and cost in launching a drug into the market, keeping two aspects in mind: safety and efficacy. After launching drugs into the market, organisations then need to monitor adverse events from these drugs and take action accordingly. In this white paper, a few approaches are discussed for automating the post market ...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The rack-aware data placement strategy of the current Hadoop distributed file system assumes a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...

Scalability of the SAS/STAT HPGENSELECT High-Performance Analytical Procedure: A comparison with RevoScaleR

Effectively implementing high-performance analytics software solutions in the insurance industry. Executive Summary: At the Strata Conference on October 25, 2012, the research and planning division of a large insurance corporation (hereafter "insurer") presented various methods that they used to model 150 million observations of insurance data. A summary of their presentation is available at: ...

Optimization Strategies for A/B Testing on HADOOP

In this work, we present a set of techniques that considerably improve the performance of executing concurrent MapReduce jobs. Our proposed solution relies on proper resource allocation for concurrent Hive jobs based on data dependency, inter-query optimization and modeling of Hadoop cluster load. To the best of our knowledge, this is the first work towards Hive/MapReduce job optimization which...

Tutorial: SQL-on-Hadoop Systems

Enterprises are increasingly using Apache Hadoop, more specifically HDFS, as a central repository for all their data; data coming from various sources, including operational systems, social media and the web, sensors and smart devices, as well as their applications. At the same time many enterprise data management tools (e.g. from SAP ERP and SAS to Tableau) rely on SQL and many enterprise user...

Publication date: 2016